Abstract: Data mining techniques make it possible to analyze data sets and discover knowledge in them. However, traditional clustering methods do not provide sufficiently accurate results for large datasets. Mahout supports the implementation of clustering algorithms that handle large volumes of data through its integration with Hadoop. The MapReduce programming model is used to process the clustering of data in distributed systems. The goal is to improve on the performance of clustering large-scale datasets on a single computer. To assess the accuracy of the K-means algorithm, the SSE value is calculated from the Euclidean distance using the MapReduce framework for two-dimensional and three-dimensional datasets.
Keywords: Clustering, Hadoop, K-means, MapReduce, SSE (sum of squared errors).
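
As an illustration of the SSE measure named in the abstract, the following is a minimal single-machine sketch in Python of how the calculation could be split into a map step and a reduce step. It is not the paper's Mahout or Hadoop implementation. The function names (`euclidean`, `map_point`, `reduce_sse`), the sample points, and the centroids are all assumptions made for this example. The same code works for two-dimensional and three-dimensional points because the distance is computed over however many coordinates each point has.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def map_point(point, centroids):
    """Map step (hypothetical): emit (nearest centroid index, squared distance)."""
    distances = [euclidean(point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest, distances[nearest] ** 2


def reduce_sse(pairs):
    """Reduce step (hypothetical): sum squared distances per cluster and overall."""
    per_cluster = defaultdict(float)
    for cluster_id, squared_distance in pairs:
        per_cluster[cluster_id] += squared_distance
    return dict(per_cluster), sum(per_cluster.values())


# Illustrative 2-dimensional data; these values are made up for the example.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
centroids = [(0.0, 1.0), (10.0, 2.0)]

per_cluster, total_sse = reduce_sse(map_point(p, centroids) for p in points)
print(per_cluster, total_sse)
```

In a real MapReduce job the map step would run on separate splits of the input and the framework would group the emitted pairs by cluster index before the reduce step; here a plain generator stands in for that shuffle. A lower total SSE for the same number of clusters indicates that points lie closer to their assigned centroids.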